Base-Delta-Immediate Compression: A Practical Data Compression Mechanism for On-Chip Caches

Authors

  • Gennady Pekhimenko
  • Vivek Seshadri
  • Onur Mutlu
  • Todd C. Mowry
  • Phillip B. Gibbons
  • Michael A. Kozuch
Abstract

Cache compression is a promising technique to increase cache capacity and to decrease on-chip and off-chip bandwidth usage. Unfortunately, directly applying well-known compression algorithms (usually implemented in software) leads to high hardware complexity and unacceptable decompression/compression latencies, which in turn can negatively affect performance. Hence, there is a need for a simple yet efficient compression technique that can effectively compress common in-cache data patterns, and has minimal effect on cache access latency. In this paper, we propose a new compression algorithm called Base-Delta-Immediate (B∆I) compression, a practical technique for compressing data in on-chip caches. The key idea of the algorithm is that, for many cache lines, the values within the cache line have a low dynamic range – i.e., the differences between values stored within the cache line are small. As a result, a cache line can be represented using a base value and an array of differences whose combined size is much smaller than the original cache line (we call this the base+delta encoding). Moreover, many cache lines intersperse such base+delta values with small values – our B∆I technique efficiently incorporates such immediate values into its encoding. Compared to prior cache compression approaches, our studies show that B∆I strikes a sweet-spot in the tradeoff between compression ratio, decompression/compression latencies, and hardware complexity. Our results show that B∆I compression improves performance for both single-core (8.1% improvement) and multi-core workloads (9.5% / 11.2% improvement for two/four cores). For many applications, B∆I provides the performance benefit of doubling the cache size of the baseline system, effectively increasing average cache capacity by 1.53X.
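The base+delta idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' hardware design: the function names (`encode_line`, `decode_line`, `encoded_size`), the choice of 8-byte values with 1-byte deltas, the use of the first large value as the base, and the one-flag-bit-per-value layout for marking immediates are all assumptions made for illustration.

```python
def encode_line(line: bytes, value_size: int = 8, delta_size: int = 1):
    """Try to encode a cache line as one base plus small signed deltas.

    Small values are kept as immediates (a delta from an implicit zero).
    Returns (base, deltas, use_base), or None if the line does not fit.
    All parameter choices here are illustrative assumptions.
    """
    if len(line) % value_size:
        raise ValueError("line length must be a multiple of value_size")
    values = [int.from_bytes(line[i:i + value_size], "little")
              for i in range(0, len(line), value_size)]
    limit = 1 << (8 * delta_size - 1)  # signed delta range is [-limit, limit)

    def fits(delta: int) -> bool:
        return -limit <= delta < limit

    base = None
    deltas, use_base = [], []
    for v in values:
        if fits(v):
            # Small (non-negative) value: store it directly as an immediate.
            deltas.append(v)
            use_base.append(False)
            continue
        if base is None:
            base = v  # the first large value becomes the base
        if not fits(v - base):
            return None  # dynamic range too wide for this delta size
        deltas.append(v - base)
        use_base.append(True)
    return (base or 0), deltas, use_base


def decode_line(base: int, deltas, use_base, value_size: int = 8) -> bytes:
    """Rebuild the original line: each value is (base or zero) plus its delta."""
    mask = (1 << (8 * value_size)) - 1
    return b"".join((((base if flag else 0) + d) & mask).to_bytes(value_size, "little")
                    for d, flag in zip(deltas, use_base))


def encoded_size(num_values: int, value_size: int = 8, delta_size: int = 1) -> int:
    """Bytes used by this sketch: one base, the deltas, and one flag bit per value."""
    return value_size + num_values * delta_size + (num_values + 7) // 8


# Example: a 64-byte line holding nearby pointers interspersed with small values.
values = [0x7F0000001000, 0, 0x7F0000001008, 3,
          0x7F0000001010, 0x7F0000001018, 1, 0x7F0000001020]
line = b"".join(v.to_bytes(8, "little") for v in values)
encoded = encode_line(line)
```

In this example the eight 8-byte values share one base and each needs only a 1-byte delta, so the sketch's layout takes far fewer bytes than the original 64. A line whose values are far apart makes `encode_line` return `None`, and the line would stay uncompressed.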

Similar resources

Practical Data Compression for Modern Memory Hierarchies

Although compression has been widely used for decades to reduce file sizes (thereby conserving storage capacity and network bandwidth when transferring files), there has been limited use of hardware-based compression within modern memory hierarchies of commodity systems. Why not? Especially as programs become increasingly data-intensive, the capacity and bandwidth within the memory hierarchy (i...

Improvement in performance of Chip-multiprocessor using Effective Dynamic Cache Compression Scheme

Abstract— Chip Multiprocessors (CMPs) combine multiple cores on a single die, typically with private level-one caches and a shared level-two cache. The gap between processor and memory speed is alleviated primarily by using caches. However, the increasing number of cores on a single chip increases the demand on a critical resource: the shared L2 cache capacity. In this dissertation work, a los...

NoΔ: Leveraging delta compression for end-to-end memory access in NoC based multicores

As the number of on-chip processing elements increases, the interconnection backbone bears bursty traffic from memory and cache accesses. In this paper, we propose a compression technique called No∆, which leverages delta compression to compress network traffic. Specifically, it conducts data encoding prior to packet injection and decoding before ejection in the network interface. The key idea ...

Determining the Proper compression Algorithm for Biomedical Signals and Design of an Optimum Graphic System to Display Them (TECHNICAL NOTES)

In this paper the need for employing a data reduction algorithm in using digital graphic systems to display biomedical signals is firstly addressed and then, some such algorithms are compared from different points of view (such as complexity, real time feasibility, etc.). Subsequently, it is concluded that Turning Point algorithm can be a suitable one for real time implementation on a microproc...

Frequent Pattern Compression: A Significance-Based Compression Scheme for L2 Caches

With the widening gap between processor and memory speeds, memory system designers may find cache compression beneficial to increase cache capacity and reduce off-chip bandwidth. Most hardware compression algorithms fall into the dictionary-based category, which depend on building a dictionary and using its entries to encode repeated data values. Such algorithms are effective in compressing lar...

Publication date: 2012